Introduction

Two sample Mendelian randomisation (2SMR) is a method to estimate the causal effect of an exposure on an outcome using only summary statistics from genome wide association studies (GWAS). Though conceptually straightforward, there are a number of steps that are required to perform the analysis properly, and they can be cumbersome. The TwoSampleMR package aims to make this easy by combining three important components

The general principles (Davey Smith and Ebrahim 2003; Davey Smith and Hemani 2014), and statistical methods (Pierce and Burgess 2013; Bowden, Davey Smith, and Burgess 2015) can be found elsewhere, here we will just outline how to use the R package.


Installation

To install directly from the GitHub repository do the following:

If you don’t have the devtools package install it from CRAN using install.packages("devtools").


Overview

The workflow for performing MR is as follows:

  1. Select instruments for the exposure (perform LD clumping if necessary)
  2. Extract the instruments from the MR Base GWAS database for the outcomes of interest
  3. Harmonise the effect sizes for the instruments on the exposures and the outcomes to be each for the same reference allele
  4. Perform MR analysis, sensitivity analyses, create plots, compile reports

A diagramatic overview is shown here:

here

here

A basic analysis, e.g. the causal effect of body mass index on coronary heart disease, looks like this:

Each step is documented in detail below.

Authentication

In order to perform any commands that access data in the MR Base database, you must authenticate the request using OAuth2.0 authentication, using a Google account. For example, requesting a list of studies in the MR Base database:

## Adding mrbase.oauth to .gitignore
## Waiting for authentication in browser...
## Press Esc/Ctrl + C to abort
## Authentication complete.
## Token cache file: mrbase.oauth

requires authentication.

If you are using a desktop computer this will open a web browser, navigating to a page that asks you to sign in to your Google account and authenticate access to MR Base.

If you are using a server which doesn’t have a graphic user interface then it will provide a link and a code to enter into a browser to authenticate access.

This function creates a new file in your working directory called mrbase.oauth. If you are using R in a working directory that does not have write permissions then this command will fail, please navigate to a directory that does have write permissions.

If you need to run this in a non-interactive script then you can generate the mrbase.oauth file on an interactive computer, copy that file to the working directory that R will be running from, and then run a batch (non-interactive).

WARNING: As of 2017-11-22 this behaviour has changed. Previously the authentication file was a hidden file called .httr-oauth. Everything works exactly the same except this file is now called mrbase.oauth.


Exposure data

A data frame of the instruments for an exposure is required. Each line has the information for one SNP for one exposure. The minimum information required for MR analysis is the following:

Other information that is useful for MR can also be provided:

You can also provide the following extra information:

Reading in from a file

The data can be read in from a text file using the read_exposure_data function. The file must have a header with column names corresponding to the columns described above.

Example 1: The default column names are used

An example of a text file with the default column names is provided as part of the package, the first few rows look like this:

Phenotype SNP beta se effect_allele other_allele eaf pval units gene samplesize
BMI rs10767664 0.19 0.0306122448979592 A T 0.78 5e-26 kg/m2 BDNF 225238
BMI rs13078807 0.1 0.0204081632653061 G A 0.2 4e-11 kg/m2 CADM2 221431
BMI rs1514175 0.07 0.0204081632653061 A G 0.43 8e-14 kg/m2 TNNI3K 207641
BMI rs1558902 0.39 0.0204081632653061 A T 0.42 5e-120 kg/m2 FTO 222476
BMI rs10968576 0.11 0.0204081632653061 G A 0.31 3e-13 kg/m2 LRRN6C 247166
BMI rs2241423 0.13 0.0204081632653061 G A 0.78 1e-18 kg/m2 LBXCOR1 227886

The exact path to the file will be different on everyone’s computer, but it can be located like this:

You can read the data in like this:

##          SNP beta.exposure se.exposure effect_allele.exposure
## 1 rs10767664          0.19  0.03061224                      A
## 2 rs13078807          0.10  0.02040816                      G
## 3  rs1514175          0.07  0.02040816                      A
## 4  rs1558902          0.39  0.02040816                      A
## 5 rs10968576          0.11  0.02040816                      G
## 6  rs2241423          0.13  0.02040816                      G
##   other_allele.exposure eaf.exposure pval.exposure units.exposure
## 1                     T         0.78         5e-26          kg/m2
## 2                     A         0.20         4e-11          kg/m2
## 3                     G         0.43         8e-14          kg/m2
## 4                     T         0.42        5e-120          kg/m2
## 5                     A         0.31         3e-13          kg/m2
## 6                     A         0.78         1e-18          kg/m2
##   gene.exposure samplesize.exposure exposure mr_keep.exposure
## 1          BDNF              225238      BMI             TRUE
## 2         CADM2              221431      BMI             TRUE
## 3        TNNI3K              207641      BMI             TRUE
## 4           FTO              222476      BMI             TRUE
## 5        LRRN6C              247166      BMI             TRUE
## 6       LBXCOR1              227886      BMI             TRUE
##   pval_origin.exposure units.exposure_dat id.exposure data_source.exposure
## 1             reported              kg/m2      Rjbdv9             textfile
## 2             reported              kg/m2      Rjbdv9             textfile
## 3             reported              kg/m2      Rjbdv9             textfile
## 4             reported              kg/m2      Rjbdv9             textfile
## 5             reported              kg/m2      Rjbdv9             textfile
## 6             reported              kg/m2      Rjbdv9             textfile

The output from this function is a new data frame with standardised column names:

  • SNP
  • exposure
  • beta.exposure
  • se.exposure
  • effect_allele.exposure
  • other_allele.exposure
  • eaf.exposure
  • mr_keep.exposure
  • pval.exposure
  • pval_origin.exposure
  • id.exposure
  • data_source.exposure
  • units.exposure
  • gene.exposure
  • samplesize.exposure

The function attempts to match the columns to the ones it expects. It also checks that the data type is as expected.

If the required data for MR to be performed is not present (SNP name, effect size, standard error, effect allele) for a particular SNP, then the column mr_keep.exposure will be “FALSE”.

Example 2: The text file has non-default column names

If the text file does not have default column names, this can still be read in as follows. Here are the first few rows of an example:

rsid,effect,SE,a1,a2,a1_freq,p-value,Units,Gene,n
rs10767664,0.19,0.030612245,A,T,0.78,5.00E-26,kg/m2,BDNF,225238
rs13078807,0.1,0.020408163,G,A,0.2,4.00E-11,kg/m2,CADM2,221431
rs1514175,0.07,0.020408163,A,G,0.43,8.00E-14,kg/m2,TNNI3K,207641
rs1558902,0.39,0.020408163,A,T,0.42,5.00E-120,kg/m2,FTO,222476

Note that this is a CSV file, with commas separating fields. The file is located here:

To read in this data:

## No phenotype name specified, defaulting to 'exposure'.
##          SNP beta.exposure se.exposure effect_allele.exposure
## 1 rs10767664          0.19  0.03061224                      A
## 2 rs13078807          0.10  0.02040816                      G
## 3  rs1514175          0.07  0.02040816                      A
## 4  rs1558902          0.39  0.02040816                      A
## 5 rs10968576          0.11  0.02040816                      G
## 6  rs2241423          0.13  0.02040816                      G
##   other_allele.exposure eaf.exposure pval.exposure units.exposure
## 1                     T         0.78         5e-26          kg/m2
## 2                     A         0.20         4e-11          kg/m2
## 3                     G         0.43         8e-14          kg/m2
## 4                     T         0.42        5e-120          kg/m2
## 5                     A         0.31         3e-13          kg/m2
## 6                     A         0.78         1e-18          kg/m2
##   gene.exposure samplesize.exposure exposure mr_keep.exposure
## 1          BDNF              225238 exposure             TRUE
## 2         CADM2              221431 exposure             TRUE
## 3        TNNI3K              207641 exposure             TRUE
## 4           FTO              222476 exposure             TRUE
## 5        LRRN6C              247166 exposure             TRUE
## 6       LBXCOR1              227886 exposure             TRUE
##   pval_origin.exposure units.exposure_dat id.exposure data_source.exposure
## 1             reported              kg/m2      lnCcB5             textfile
## 2             reported              kg/m2      lnCcB5             textfile
## 3             reported              kg/m2      lnCcB5             textfile
## 4             reported              kg/m2      lnCcB5             textfile
## 5             reported              kg/m2      lnCcB5             textfile
## 6             reported              kg/m2      lnCcB5             textfile

If the Phenotype column is not provided (as is the case in this example) then it will assume that the phenotype’s name is simply “exposure”. This is entered in the exposure column. It can be renamed manually:

Using an existing data frame

If the data already exists as a data frame in R then it can be converted into the correct format using the format_data function. For example, here is some randomly created data:

##   SNP beta se effect_allele
## 1 rs1    1  1             A
## 2 rs2    2  2             T

This can be formatted like so:

## No phenotype name specified, defaulting to 'exposure'.
## Warning in format_data(random_df, type = "exposure"): The following columns are not present but are helpful for harmonisation
## other_alleleeaf
## Warning in format_data(random_df, type = "exposure"): effect_allele column
## is not character data. Coercing...
## Inferring p-values
##   SNP beta.exposure se.exposure effect_allele.exposure exposure
## 1 rs1             1           1                      A exposure
## 2 rs2             2           2                      T exposure
##   mr_keep.exposure pval.exposure pval_origin.exposure id.exposure
## 1             TRUE     0.1586553             inferred      EBJaZN
## 2             TRUE     0.1586553             inferred      EBJaZN
##   other_allele.exposure eaf.exposure
## 1                    NA           NA
## 2                    NA           NA

Obtaining instruments from existing catalogues

A number of sources of instruments have already been curated and are available for use in MR Base. They are provided as data objects in the MRInstruments package. To install:

This package contains a number of data.frames, each of which is a repository of SNP-trait associations. How to access the data frames is detailed below:

GWAS catalog

The NHGRI-EBI GWAS catalog contains a catalog of significant associations obtained from GWASs. This version of the data is filtered and harmonised to contain associations that have the required data to perform MR, to ensure that the units used to report effect sizes from a particular study are all the same, and other data cleaning operations.

To use the GWAS catalog:

##                                     Phenotype_simple
## 1 β2-Glycoprotein I (β2-GPI) plasma levels
## 2 β2-Glycoprotein I (β2-GPI) plasma levels
## 3 β2-Glycoprotein I (β2-GPI) plasma levels
## 4 β2-Glycoprotein I (β2-GPI) plasma levels
## 5 β2-Glycoprotein I (β2-GPI) plasma levels
## 6 β2-Glycoprotein I (β2-GPI) plasma levels
##           MAPPED_TRAIT_EFO                 MAPPED_TRAIT_EFO_URI
## 1 glycoprotein measurement http://www.ebi.ac.uk/efo/EFO_0004555
## 2 glycoprotein measurement http://www.ebi.ac.uk/efo/EFO_0004555
## 3 glycoprotein measurement http://www.ebi.ac.uk/efo/EFO_0004555
## 4 glycoprotein measurement http://www.ebi.ac.uk/efo/EFO_0004555
## 5 glycoprotein measurement http://www.ebi.ac.uk/efo/EFO_0004555
## 6 glycoprotein measurement http://www.ebi.ac.uk/efo/EFO_0004555
##          Initial_sample_description Replication_sample_description
## 1 306 European ancestry individuals                           <NA>
## 2 306 European ancestry individuals                           <NA>
## 3 306 European ancestry individuals                           <NA>
## 4 306 European ancestry individuals                           <NA>
## 5 306 European ancestry individuals                           <NA>
## 6 306 European ancestry individuals                           <NA>
##   STUDY.ACCESSION
## 1      GCST001800
## 2      GCST001800
## 3      GCST001800
## 4      GCST001800
## 5      GCST001800
## 6      GCST001800
##                                                            Phenotype
## 1 &beta;2-Glycoprotein I (&beta;2-GPI) plasma levels (unit decrease)
## 2 &beta;2-Glycoprotein I (&beta;2-GPI) plasma levels (unit decrease)
## 3 &beta;2-Glycoprotein I (&beta;2-GPI) plasma levels (unit decrease)
## 4 &beta;2-Glycoprotein I (&beta;2-GPI) plasma levels (unit increase)
## 5 &beta;2-Glycoprotein I (&beta;2-GPI) plasma levels (unit increase)
## 6 &beta;2-Glycoprotein I (&beta;2-GPI) plasma levels (unit increase)
##   Phenotype_info PubmedID         Author Year        SNP chr bp_ens_GRCh38
## 1                23279374 Athanasiadis G 2013 rs10048158  17      66240200
## 2                23279374 Athanasiadis G 2013   rs193741   5      25692864
## 3                23279374 Athanasiadis G 2013  rs4925295  20      61806743
## 4                23279374 Athanasiadis G 2013 rs11190179  10      99605556
## 5                23279374 Athanasiadis G 2013  rs2319125  17      66102427
## 6                23279374 Athanasiadis G 2013  rs2647528  11       9087740
##     Region     gene   Gene_ens effect_allele other_allele  beta        se
## 1  17q24.2     APOH       APOH             C            T -0.41 0.0838165
## 2   5p14.1    CDH10 CDH10,CDH9             A            G -0.64 0.1346398
## 3 20q13.33     CDH4       CDH4             A            G -0.52 0.1093948
## 4  10q24.2 SLC25A28   SLC25A28             G            A  0.65 0.1367435
## 5  17q24.1   CEP112     CEP112             C            T  0.42 0.0858608
## 6  11p15.4   SCUBE2     SCUBE2             C            T  0.59 0.1151568
##    pval         units  eaf date_added_to_MRBASE
## 1 1e-06 unit decrease 0.50           2017-03-20
## 2 2e-06 unit decrease 0.14           2017-03-20
## 3 2e-06 unit decrease 0.16           2017-03-20
## 4 2e-06 unit increase 0.12           2017-03-20
## 5 1e-06 unit increase 0.46           2017-03-20
## 6 3e-07 unit increase 0.17           2017-03-20

For example, to obtain instruments for body mass index using the Speliotes et al 2010 study:

Metabolites

Independent top hits from GWASs on 121 metabolites in whole blood are stored in the metab_qtls data object. Use ?metab_qtls to get more information.

##   phenotype chromosome  position       SNP effect_allele other_allele
## 1     AcAce          8   9181395 rs2169387             G            A
## 2     AcAce         11 116648917  rs964184             C            G
## 3       Ace          6  12042473 rs6933521             C            T
## 4       Ala          2  27730940 rs1260326             C            T
## 5       Ala          2  65220910 rs2160387             C            T
## 6       Ala         12  47201814 rs4554975             G            A
##        eaf      beta       se     pval n_studies     n
## 1 0.870251  0.085630 0.015451 3.61e-08        11 19257
## 2 0.857715 -0.096027 0.014624 6.71e-11        11 19261
## 3 0.120256 -0.091667 0.015885 8.10e-09        14 24742
## 4 0.638817 -0.104582 0.009940 7.40e-26        13 22569
## 5 0.403170 -0.071001 0.009603 1.49e-13        14 24793
## 6 0.644059 -0.069135 0.009598 6.12e-13        14 24792

For example, to obtain instruments for the Alanine:

Proteins

Independent top hits from GWASs on 47 protein levels in whole blood are stored in the proteomic_qtls data object. Use ?proteomic_qtls to get more information.

##   analyte chr  position        SNP  gene location annotation other_allele
## 1   CFHR1   1 196698945 rs12144939   CFH      cis   missense            T
## 2    IL6r   1 154425456 rs12126142  IL6R      cis   missense            A
## 3   ApoA4  11 116677723  rs1263167 APOA4      cis intergenic            G
## 4    SELE   9 136149399   rs507666   ABO    trans   intronic            A
## 5 FetuinA   3 186335941  rs2070633  AHSG      cis   missense            T
## 6     ACE  17  61566031     rs4343   ACE      cis synonymous            A
##   effect_allele   eaf   maf      pval   beta         se
## 1             G 0.643 0.357 8.99e-143 -1.108 0.04355258
## 2             G 0.608 0.392 1.81e-106  0.850 0.03878364
## 3             A 0.803 0.197  2.64e-54 -0.919 0.05922332
## 4             G 0.809 0.191  1.01e-52 -0.882 0.05771545
## 5             C 0.676 0.324  2.88e-44 -0.629 0.04506925
## 6             G 0.508 0.492  6.66e-44  0.493 0.03547679

For example, to obtain instruments for the ApoH protein:

## Warning in format_data(proteomic_qtls_subset, type = type, phenotype_col =
## "analyte"): effect_allele column is not character data. Coercing...
## Warning in format_data(proteomic_qtls_subset, type = type, phenotype_col =
## "analyte"): other_allele column is not character data. Coercing...

Gene expression levels

Independent top hits from GWASs on 32432 gene identifiers and in 44 tissues are available from the GTEX study in gtex_eqtl. Use ?gtex_eqtl to get more information.

##                 tissue     gene_name gene_start         SNP snp_position
## 1 Adipose Subcutaneous RP4-669L17.10   1:317720   rs2519065     1:787151
## 2 Adipose Subcutaneous RP11-206L10.1   1:661611  rs11804171     1:723819
## 3 Adipose Subcutaneous RP11-206L10.3   1:677193 rs149110718     1:759227
## 4 Adipose Subcutaneous RP11-206L10.2   1:700306 rs148649543     1:752796
## 5 Adipose Subcutaneous RP11-206L10.9   1:714150  rs12184279     1:717485
## 6 Adipose Subcutaneous RP11-206L10.8   1:736259  rs10454454     1:754954
##   effect_allele other_allele      beta        se        pval   n
## 1             A            G  0.551788 0.0747180 2.14627e-12 298
## 2             A            T -0.917475 0.1150060 4.99967e-14 298
## 3             T            C  0.807571 0.1776530 8.44694e-06 298
## 4             T            C  0.745393 0.0958531 1.82660e-13 298
## 5             A            C  1.927250 0.2247390 9.55098e-16 298
## 6             A            G  1.000400 0.1787470 5.61079e-08 298

For example, to obtain instruments for the IRAK1BP1 gene expression levels in subcutaneous adipose tissue:

## Warning in format_data(gtex_eqtl_subset, type = type, phenotype_col = type, : The following columns are not present but are helpful for harmonisation
## eaf
## Inferring p-values

DNA methylation levels

Independent top hits from GWASs on 0 DNA methylation levels in whole blood across 5 time points are available from the ARIES study in aries_mqtl. Use ?aries_mqtl to get more information.

##          SNP timepoint        cpg    beta        pval         se snp_chr
## 1 esv2656832         1 cg21826606  0.3459 1.60408e-26 0.03265336       1
## 2 esv2658098         1 cg22681495 -0.6263 1.55765e-66 0.03643240      15
## 3 esv2660043         1 cg24276624 -0.5772 3.16370e-26 0.05481823      11
## 4 esv2660043         1 cg11157765 -0.5423 1.33928e-22 0.05583777      11
## 5 esv2660673         1 cg05832925 -0.5919 2.88011e-50 0.03982467      11
## 6 esv2660769         1 cg05859533 -0.6224 1.49085e-58 0.03868158      16
##    snp_pos effect_allele other_allele    eaf   sex   age    units
## 1 25591901             I            R 0.3974 mixed Birth SD units
## 2 86057007             D            R 0.2076 mixed Birth SD units
## 3 69982552             D            R 0.1450 mixed Birth SD units
## 4 69982552             D            R 0.1450 mixed Birth SD units
## 5 74024905             D            R 0.1671 mixed Birth SD units
## 6 57725395             D            R 0.2136 mixed Birth SD units
##   island_location cpg_chr  cpg_pos    gene gene_location cis_trans
## 1         N_Shore       1 25593055                             cis
## 2                      15 86058755  AKAP13          Body       cis
## 3                      11 69982941    ANO1          Body       cis
## 4                      11 69982996    ANO1          Body       cis
## 5         S_Shelf      11 74026371                             cis
## 6                      16 57727230 CCDC135       TSS1500       cis

For example, to obtain instruments for cg25212131 CpG DNA methylation levels in at birth:

MR Base GWAS database

The MR Base GWAS database contains the entire summary statistics for hundreds of GWASs. You can use this database to define the instruments for a particular exposure. You can also use this database to obtain the effects for constructing polygenic risk scores using different p-value thresholds.

For example, to obtain details about the available GWASs do the following:

## Token cache file: mrbase.oauth
##   access      author    category consortium
## 1 public   Dastani Z Risk factor   ADIPOGen
## 2 public   Jostins L     Disease     IIBDGC
## 3 public  Randall JC Risk factor      GIANT
## 4 public       Okbay Risk factor      SSGAC
## 5 public       Okbay Risk factor      SSGAC
## 6 public Kilpelainen Risk factor       <NA>
##                                                                                filename
## 1                                  adipogen.discovery.eur_.meta_.public.release.txt.tab
## 2                                                 CD.gwas_ichip_meta_release.txt.gz.tab
## 3 GIANT_Randall2013PlosGenet_stage1_publicrelease_HapMapCeuFreq_HIPadjBMI_MEN_N.txt.tab
## 4                                                                    DS_Full.txt.gz.tab
## 5                                                              EduYears_Main.txt.gz.tab
## 6                                                       Leptin_Adjusted_for_BMI.txt.tab
##     id mr ncase ncontrol
## 1    1  1    NA       NA
## 2   10  1 14763    15977
## 3  100  1    NA       NA
## 4 1000  1    NA       NA
## 5 1001  1    NA       NA
## 6 1002  1    NA       NA
##                                                      note    nsnp
## 1                                                    <NA> 2675209
## 2                                                    <NA>   13898
## 3                                        Adjusted for BMI 2725796
## 4                                                    <NA> 6524475
## 5                                                    <NA> 8146841
## 6 Adjusted for BMI; effect allele frequencies are missing 2474010
##                                                                                                        path
## 1                                                   /projects/MRC-IEU/publicdata/GWAS_summary_data/ADIPOGEN
## 2                                                        /projects/MRC-IEU/publicdata/GWAS_summary_data/IBD
## 3                                       /projects/MRC-IEU/publicdata/GWAS_summary_data/GIANT_2010_2012_2013
## 4                               /projects/MRC-IEU/research/data/evidencehub/summary/gwas/raw/SSGAC_27089181
## 5 /projects/MRC-IEU/research/data/evidencehub/summary/gwas/raw/SSGAC_education_educational_attainment_2016/
## 6                           /projects/MRC-IEU/research/data/evidencehub/summary/gwas/raw/Kilpelainen_Leptin
##       pmid population priority sample_size     sd               sex
## 1 22479202      Mixed        1       39883 0.5700 Males and females
## 2 23128233   European        1       30740     NA Males and females
## 3 23754948   European       15       60586 8.4548             Males
## 4 27089181   European        1      161460     NA Males and females
## 5 27225129   European        1      293723 3.7100 Males and females
## 6 26833098   European        1       32161     NA Males and females
##                  subcategory               trait       unit year
## 1                    Protein         Adiponectin  ln(mg/dL) 2012
## 2  Autoimmune / inflammatory     Crohn's disease   log odds 2012
## 3             Anthropometric   Hip circumference    SD (cm) 2013
## 4 Psychiatric / neurological Depressive symptoms         SD 2016
## 5                  Education  Years of schooling SD (years) 2016
## 6                    Hormone              Leptin  log ng/ml 2016

For information about authentication see the Authentication section.

The available_outcomes function returns a table of all the available studies in the database. Each study has a unique ID. e.g.

##                 trait   id
## 1         Adiponectin    1
## 2     Crohn's disease   10
## 3   Hip circumference  100
## 4 Depressive symptoms 1000
## 5  Years of schooling 1001
## 6              Leptin 1002

To extract instruments for a particular trait using a particular study, for example to obtain SNPs for body mass index using the Locke et al 2015 GIANT study, you specify the study ID as follows:

## Requesting default values. Extracting from pre-clumped data
## Warning in extract_instruments(outcomes = 2): From version 0.4.2 the
## exposure name format has changed.

This returns a set of LD clumped SNPs that are GWAS significant for BMI. You can specify various parameters for this function:

  • p1 = P-value threshold for keeping a SNP
  • clump = Whether or not to return independent SNPs only (default=TRUE)
  • r2 = The maximum LD R-square allowed between returned SNPs
  • kb = The distance in which to search for LD R-square values

By changing changing the p1 parameter it is possible to obtain SNP effects for constructing polygenic risk scores.

Clumping

For standard two sample MR it is important to ensure that the instruments for the exposure are independent. Once instruments have been identified for an exposure variable, MR Base can be used to perform clumping.

The European samples from the 1000 genomes project are used to estimate LD between SNPs. You can provide a list of SNP IDs, the SNPs will be extracted from 1000 genomes data, LD calculated between them, and amongst those SNPs that have LD R-square above the specified threshold only the SNP with the lowest P-value will be retained. To do this, use the following command:

## Warning: since v0.4.2 the default r2 value for clumping has changed from 0.01 to 0.001
## Clumping lnCcB5, 30 SNPs
## Removing the following SNPs due to LD with other SNPs:
## rs1514175
## rs7359397
## rs3810291

The clump_data command takes any data frame that has been formatted to be an exposure data type of data frame. Note that for the instruments in the R/MRInstruments package the SNPs are already LD clumped.


Outcome data

Once instruments for the exposure trait have been specified, those SNPs need to be extracted from the outcome trait.

Available studies in MR Base

MR Base contains complete GWAS summary statistics from a large number of studies. To obtain details about the available GWASs do the following:

## Token cache file: mrbase.oauth
##   access      author    category consortium
## 1 public   Dastani Z Risk factor   ADIPOGen
## 2 public   Jostins L     Disease     IIBDGC
## 3 public  Randall JC Risk factor      GIANT
## 4 public       Okbay Risk factor      SSGAC
## 5 public       Okbay Risk factor      SSGAC
## 6 public Kilpelainen Risk factor       <NA>
##                                                                                filename
## 1                                  adipogen.discovery.eur_.meta_.public.release.txt.tab
## 2                                                 CD.gwas_ichip_meta_release.txt.gz.tab
## 3 GIANT_Randall2013PlosGenet_stage1_publicrelease_HapMapCeuFreq_HIPadjBMI_MEN_N.txt.tab
## 4                                                                    DS_Full.txt.gz.tab
## 5                                                              EduYears_Main.txt.gz.tab
## 6                                                       Leptin_Adjusted_for_BMI.txt.tab
##     id mr ncase ncontrol
## 1    1  1    NA       NA
## 2   10  1 14763    15977
## 3  100  1    NA       NA
## 4 1000  1    NA       NA
## 5 1001  1    NA       NA
## 6 1002  1    NA       NA
##                                                      note    nsnp
## 1                                                    <NA> 2675209
## 2                                                    <NA>   13898
## 3                                        Adjusted for BMI 2725796
## 4                                                    <NA> 6524475
## 5                                                    <NA> 8146841
## 6 Adjusted for BMI; effect allele frequencies are missing 2474010
##                                                                                                        path
## 1                                                   /projects/MRC-IEU/publicdata/GWAS_summary_data/ADIPOGEN
## 2                                                        /projects/MRC-IEU/publicdata/GWAS_summary_data/IBD
## 3                                       /projects/MRC-IEU/publicdata/GWAS_summary_data/GIANT_2010_2012_2013
## 4                               /projects/MRC-IEU/research/data/evidencehub/summary/gwas/raw/SSGAC_27089181
## 5 /projects/MRC-IEU/research/data/evidencehub/summary/gwas/raw/SSGAC_education_educational_attainment_2016/
## 6                           /projects/MRC-IEU/research/data/evidencehub/summary/gwas/raw/Kilpelainen_Leptin
##       pmid population priority sample_size     sd               sex
## 1 22479202      Mixed        1       39883 0.5700 Males and females
## 2 23128233   European        1       30740     NA Males and females
## 3 23754948   European       15       60586 8.4548             Males
## 4 27089181   European        1      161460     NA Males and females
## 5 27225129   European        1      293723 3.7100 Males and females
## 6 26833098   European        1       32161     NA Males and females
##                  subcategory               trait       unit year
## 1                    Protein         Adiponectin  ln(mg/dL) 2012
## 2  Autoimmune / inflammatory     Crohn's disease   log odds 2012
## 3             Anthropometric   Hip circumference    SD (cm) 2013
## 4 Psychiatric / neurological Depressive symptoms         SD 2016
## 5                  Education  Years of schooling SD (years) 2016
## 6                    Hormone              Leptin  log ng/ml 2016

For information about authentication see the Authentication section.

The available_outcomes function returns a table of all the available studies in the database. Each study has a unique ID. e.g.

##                 trait   id
## 1         Adiponectin    1
## 2     Crohn's disease   10
## 3   Hip circumference  100
## 4 Depressive symptoms 1000
## 5  Years of schooling 1001
## 6              Leptin 1002

Extracting particular SNPs from particular studies

If we want to perform MR of BMI against coronary heart disease, we need to identify the SNPs that influence the BMI, and then extract those SNPs from a GWAS on coronary heart disease.

We have already extracted the 27 SNP effects for BMI:

##          SNP beta.exposure se.exposure effect_allele.exposure
## 1 rs10767664          0.19  0.03061224                      A
## 2 rs13078807          0.10  0.02040816                      G
## 3  rs1558902          0.39  0.02040816                      A
## 4 rs10968576          0.11  0.02040816                      G
## 5  rs2241423          0.13  0.02040816                      G
## 6  rs3817334          0.06  0.02040816                      T
##   other_allele.exposure eaf.exposure pval.exposure units.exposure
## 1                     T         0.78         5e-26          kg/m2
## 2                     A         0.20         4e-11          kg/m2
## 3                     T         0.42        5e-120          kg/m2
## 4                     A         0.31         3e-13          kg/m2
## 5                     A         0.78         1e-18          kg/m2
## 6                     C         0.41         2e-12          kg/m2
##   gene.exposure samplesize.exposure exposure mr_keep.exposure
## 1          BDNF              225238      BMI             TRUE
## 2         CADM2              221431      BMI             TRUE
## 3           FTO              222476      BMI             TRUE
## 4        LRRN6C              247166      BMI             TRUE
## 5       LBXCOR1              227886      BMI             TRUE
## 6        CUGBP1              209051      BMI             TRUE
##   pval_origin.exposure units.exposure_dat id.exposure data_source.exposure
## 1             reported              kg/m2      lnCcB5             textfile
## 2             reported              kg/m2      lnCcB5             textfile
## 3             reported              kg/m2      lnCcB5             textfile
## 4             reported              kg/m2      lnCcB5             textfile
## 5             reported              kg/m2      lnCcB5             textfile
## 6             reported              kg/m2      lnCcB5             textfile

We now need to find a suitable GWAS for coronary heart disease. We can search the available studies:

##      access      author category        consortium
## 698  public       Peden  Disease               C4D
## 809  public      Nikpay  Disease CARDIoGRAMplusC4D
## 920  public Schunkert H  Disease        CARDIoGRAM
## 1031 public    Deloukas  Disease CARDIoGRAMplusC4D
## 1618 public       Neale       NA         Neale Lab
##                                                  filename        id mr
## 698         C4D_CAD_DISCOVERY_METAANALYSIS_UPDATE.TXT.tab         6  1
## 809                        cad.add.160614.website.txt.tab         7  1
## 920                    CARDIoGRAM_GWAS_RESULTS.txt.gz.tab         8  1
## 1031 cardiogramplusc4d_180814_update_data.txt.tab.all_pos         9  1
## 1618                  Allele.reorder.I25.assoc.tsv.gz.tab UKB-a:534  1
##      ncase ncontrol
## 698  15420    15062
## 809  60801   123504
## 920  22233    64762
## 1031 63746   130681
## 1618  8755   328444
##                                                                                                   note
## 698                                                                                               <NA>
## 809                                                                                               <NA>
## 920                                                                                               <NA>
## 1031                                                                                              <NA>
## 1618 Data from http://www.nealelab.is/blog/2017/9/11/details-and-considerations-of-the-uk-biobank-gwas
##          nsnp
## 698    540233
## 809   9455779
## 920   2420361
## 1031    79129
## 1618 10894596
##                                                                               path
## 698                      /projects/MRC-IEU/publicdata/GWAS_summary_data/cardiogram
## 809                      /projects/MRC-IEU/publicdata/GWAS_summary_data/cardiogram
## 920                      /projects/MRC-IEU/publicdata/GWAS_summary_data/cardiogram
## 1031                     /projects/MRC-IEU/publicdata/GWAS_summary_data/cardiogram
## 1618 /projects/MRC-IEU/research/data/ukbiobank/summary/gwas/dev/cleaned_597_files/
##          pmid population priority sample_size sd               sex
## 698  21378988      Mixed        3       30482 NA Males and females
## 809  26343387      Mixed        1      184305 NA Males and females
## 920  21378990   European        2       86995 NA Males and females
## 1031 23202125      Mixed        1      194427 NA Males and females
## 1618        0  Europrean        1      337199  0 Males and Females
##         subcategory
## 698  Cardiovascular
## 809  Cardiovascular
## 920  Cardiovascular
## 1031 Cardiovascular
## 1618             NA
##                                                            trait     unit
## 698                                       Coronary heart disease log odds
## 809                                       Coronary heart disease log odds
## 920                                       Coronary heart disease log odds
## 1031                                      Coronary heart disease log odds
## 1618 Diagnoses - main ICD10: I25 Chronic ischaemic heart disease       NA
##      year
## 698  2011
## 809  2015
## 920  2011
## 1031 2013
## 1618 2017

The most recent CARDIOGRAM GWAS is ID number 7. We can extract the BMI SNPs from this GWAS as follows:

## Extracting data for 27 SNP(s) from 1 GWAS(s)
## Token cache file: mrbase.oauth
## Warning in format_d(d): From version 0.4.2 the outcome format has
## changed. You can find the deprecated version of the output name in
## outcome.deprecated

The extract_outcome_data is a flexible function. The snps argument only requires an array of rsIDs, and the outcomes variable can be a vector of outcomes. e.g. chd_out_dat(c("rs234", "rs17097147"), c(2, 7)) will extract the two SNPs from each of the outcomes 2 and 7.

LD proxies

By default if a particular requested SNP is not present in the outcome GWAS then a SNP (proxy) that is in LD with the requested SNP (target) will be searched for instead. LD proxies are defined using 1000 genomes European sample data. The effect of the proxy SNP on the outcome is returned, along with the proxy SNP, the effect allele of the proxy SNP, and the corresponding allele (in phase) for the target SNP.

The parameters for handling LD proxies are as follows:

  • proxies = TRUE or FALSE (TRUE by default)
  • rsq = numeric value of minimum rsq to find a proxy. Default is 0.8, minimum is 0.6
  • palindromes = Allow palindromic SNPs? Default is 1 (yes)
  • maf_threshold = If palindromes allowed then what is the maximum minor allele frequency of palindromes allowed? Default is 0.3.

Using local GWAS summary data

If you have GWAS summary data that is not present in MR Base, this can still be to perform analysis, though the LD proxy functionality is not currently available.

Supposing there is a GWAS summary file called “gwas_summary.csv” with e.g. 2 million rows and it looks like this:

rsid,effect,SE,a1,a2,a1_freq,p-value,Units,Gene,n
rs10767664,0.19,0.030612245,A,T,0.78,5.00E-26,kg/m2,BDNF,225238
rs13078807,0.1,0.020408163,G,A,0.2,4.00E-11,kg/m2,CADM2,221431
rs1514175,0.07,0.020408163,A,G,0.43,8.00E-14,kg/m2,TNNI3K,207641
rs1558902,0.39,0.020408163,A,T,0.42,5.00E-120,kg/m2,FTO,222476
...
...

To extract the exposure SNPs from this data, we would use the following command:

This returns an outcome data frame with only the SNPs that were requested (if those SNPs were present in the “gwas_summary.csv” file).

Outcome data format

The extract_outcome_data function returns a table of SNP effects for the requested SNPs on the requested outcomes. The format of the data is similar to the exposure data format, except the main columns are as follows:

  • SNP
  • beta.outcome
  • se.outcome
  • samplesize.outcome
  • ncase.outcome
  • ncontrol.outcome
  • pval.outcome
  • eaf.outcome
  • effect_allele.outcom
  • other_allele.outcome
  • units.outcome
  • outcome
  • consortium.outcome
  • year.outcome
  • pmid.outcome
  • id.outcome
  • originalname.outcome
  • proxy.outcome
  • target_snp.outcome
  • proxy_snp.outcome
  • target_a1.outcome
  • target_a2.outcome
  • proxy_a1.outcome
  • proxy_a2.outcome
  • mr_keep.outcome
  • data_source.outcome

Harmonise data

The exposure data and outcome data are now obtained, but it is important to harmonise the effects. This means that the effect of a SNP on the exposure and the effect of that SNP on the outcome must each correspond to the same allele.

To harmonise the exposure and outcome data, do the following:

## Harmonising BMI (lnCcB5) and Coronary heart disease || id:7 (7)

This creates a new data frame that has the exposure data and outcome data combined.

If there were 3 exposure traits and 3 outcome traits then there will be 9 sets of harmonisations being performed - harmonising the SNP effects of exposure trait 1 against outcome trait 1; exposure trait 1 against outcome trait 2; and so on.

Dealing with strand issues

Recent GWASs typically present the effects of a SNP in reference to the allele on the forward strand. But as reference panels are updated the forward strand sometimes changes, and GWASs from a few years ago aren’t guaranteed to be using forward strand conventions.

Some examples are shown below:

Correct, unambigious

exposure effect = 0.5
effect allele = A
other allele = G

outcome effect = 0.05
effect allele = A
other allele = G

Here the effect allele on the exposure and the outcome is the same

Incorrect reference, unambigious

exposure effect = 0.5
effect allele = A
other allele = G

outcome effect = -0.05
effect allele = C
other allele = T

Here the outcome GWAS is presenting the effect for the alternate allele on the reverse strand. We need to flip the outcome effect to 0.05 to correspond to the same allele as the exposure GWAS on the forward strand.

Ambiguous

exposure effect = 0.5
effect allele = A
other allele = G

outcome effect = -0.05
effect allele = A
other allele = C

Here the alleles do not correspond for the same SNP, so this SNP will be discarded from the analysis.

Palindromic SNP, inferrable

exposure effect = 0.5
effect allele = A
other allele = T
effect allele frequency = 0.11

outcome effect = -0.05
effect allele = A
other allele = T
effect allele frequency = 0.91

Here the alleles correspond, but it is a palindromic SNP, such that the alleles on the forward strand are the same as on the reverse strand (A/T on forward is T/A on the reverse). However, the allele frequency of the effect allele gives us information - if the outcome effect allele (A) were on the forward strand we would expect it to have a low allele frequency, but given it has a high frequency (0.91) we infer that the outcome GWAS is presenting the effect on the reverse strand for the alternative allele. We would flip the effect to 0.05 for the outcome GWAS.

Palindromic SNP, not inferrable

exposure effect = 0.5
effect allele = A
other allele = T
effect allele frequency = 0.50

outcome effect = -0.05
effect allele = A
other allele = T
effect allele frequency = 0.50

This is similar to the above, except the allele frequency no longer gives us information about the strand. We would discard this SNP. This is done for any palindromic SNPs that have minor allele frequency above 0.42.

Options

There are three options to harmonising the data.

  1. Assume all alleles are presented on the forward strand
  2. Try to infer the forward strand alleles using allele frequency information
  3. Correct the strand for non-palindromic SNPs, but drop all palindromic SNPs

By default, the harmonise_data function uses option 2, but this can be modified using the action argument, e.g. harmonise_data(exposure_dat, outcome_dat, action=3).

Drop duplicate exposure-outcome summary sets

After data harmonisation, users may find that their dataset contains duplicate exposure-outcome summary sets. This can arise, for example, when a GWAS consortium has released multiple results from separate GWAS analyses for the same trait. We recommend that users prune their datasets so that only the exposure-outcome combination with the highested expected power is retained. This can be done by selecting the exposure-outcome summary set with the largest sample size for the outcome, using the power.prune function:

This drops the duplicate summary sets with the smaller sample size for the outcome. However, if there are a large number of SNPs available to instrument an exposure, the outcome GWAS with the better SNP coverage may provide better power than the outcome GWAS with the larger sample size. This can occur, for example, if the larger outcome GWAS has used a targeted or fine-mapping genotyping array. In such instances, it may be better to prune studies on the basis of the strength of the available instruments as well as sample size. This can be done by setting size.method to FALSE in the power.prune function:

## [1] "BMI Coronary heart disease || id:7"

This procedure estimates power for each summary set, taking into account sample size and the variance explained in the exposure by the available SNPs. It then drops the duplicate summary sets with the lower expected power. The method assumes, however, that the SNP-outcome effects are log odds ratios and SNP-exposure effects are in standard deviation units.


Perform MR

Once the exposure and outcome data are harmonised, we have effects and standard errors for each instrument SNP available for the exposure and outcome traits. We can use this information to perform Mendelian randomisation. To do this, simply run:

## Analysing 'lnCcB5' on '7'
##   id.exposure id.outcome                        outcome exposure
## 1      lnCcB5          7 Coronary heart disease || id:7      BMI
## 2      lnCcB5          7 Coronary heart disease || id:7      BMI
## 3      lnCcB5          7 Coronary heart disease || id:7      BMI
## 4      lnCcB5          7 Coronary heart disease || id:7      BMI
## 5      lnCcB5          7 Coronary heart disease || id:7      BMI
##                      method nsnp          b         se         pval
## 1                  MR Egger   27 0.11385662 0.03292752 1.962323e-03
## 2           Weighted median   27 0.07940417 0.02019403 8.422051e-05
## 3 Inverse variance weighted   27 0.11391684 0.01611170 1.544411e-12
## 4               Simple mode   27 0.07510331 0.03947680 6.824145e-02
## 5             Weighted mode   27 0.08875377 0.02027617 1.738569e-04

This returns a data frame of estimates of the causal effect of the exposure on the outcome for a range of different MR methods.

If there were multiple exposures against multiple outcomes in dat, the mr() function will perform each MR method for each combination of exposure-outcome traits.

MR methods

The list of available MR methods can be obtained:

##                              obj
## 1                  mr_wald_ratio
## 2               mr_two_sample_ml
## 3            mr_egger_regression
## 4  mr_egger_regression_bootstrap
## 5               mr_simple_median
## 6             mr_weighted_median
## 7   mr_penalised_weighted_median
## 8                         mr_ivw
## 9                     mr_ivw_mre
## 10                     mr_ivw_fe
## 11                mr_simple_mode
## 12              mr_weighted_mode
## 13         mr_weighted_mode_nome
## 14           mr_simple_mode_nome
## 15                       mr_raps
##                                                         name PubmedID
## 1                                                 Wald ratio         
## 2                                         Maximum likelihood         
## 3                                                   MR Egger 26050253
## 4                                       MR Egger (bootstrap) 26050253
## 5                                              Simple median         
## 6                                            Weighted median         
## 7                                  Penalised weighted median         
## 8                                  Inverse variance weighted         
## 9  Inverse variance weighted (multiplicative random effects)         
## 10                 Inverse variance weighted (fixed effects)         
## 11                                               Simple mode         
## 12                                             Weighted mode         
## 13                                      Weighted mode (NOME)         
## 14                                        Simple mode (NOME)         
## 15                      Robust adjusted profile score (RAPS)         
##    Description use_by_default heterogeneity_test
## 1                        TRUE              FALSE
## 2                       FALSE               TRUE
## 3                        TRUE               TRUE
## 4                       FALSE              FALSE
## 5                       FALSE              FALSE
## 6                        TRUE              FALSE
## 7                       FALSE              FALSE
## 8                        TRUE               TRUE
## 9                       FALSE              FALSE
## 10                      FALSE              FALSE
## 11                       TRUE              FALSE
## 12                       TRUE              FALSE
## 13                      FALSE              FALSE
## 14                      FALSE              FALSE
## 15                      FALSE              FALSE

To perform them, they can be specified in the mr() function, e.g. to only perform MR Egger regression and Inverse variance weighted methods,

## Analysing 'lnCcB5' on '7'
##   id.exposure id.outcome                        outcome exposure
## 1      lnCcB5          7 Coronary heart disease || id:7      BMI
## 2      lnCcB5          7 Coronary heart disease || id:7      BMI
##                      method nsnp         b         se         pval
## 1                  MR Egger   27 0.1138566 0.03292752 1.962323e-03
## 2 Inverse variance weighted   27 0.1139168 0.01611170 1.544411e-12

By default, all the methods that are labelled TRUE in the use_by_default column are used by the mr() function.


Sensitivity analyses

Heterogeneity statistics

Some of the MR methods can also perform tests for heterogeneity. To obtain those statistics:

## Warning in mr_heterogeneity(dat): Prior to version 0.4.9 there was a bug
## in the IVW Q statistic estimate, leading to a slight underestimation in
## heterogeneity. This has now been resolved.
##   id.exposure id.outcome                        outcome exposure
## 1      lnCcB5          7 Coronary heart disease || id:7      BMI
## 2      lnCcB5          7 Coronary heart disease || id:7      BMI
## 3      lnCcB5          7 Coronary heart disease || id:7      BMI
##                      method        Q Q_df     Q_pval
## 1        Maximum likelihood 37.45365   26 0.06799850
## 2                  MR Egger 39.68702   25 0.03139634
## 3 Inverse variance weighted 39.68703   26 0.04185786

As with the mr() function, the mr_heterogeneity() function can take an argument to only perform heterogeneity tests using specified methods, e.g.

## Warning in mr_heterogeneity(dat, method_list = c("mr_egger_regression", :
## Prior to version 0.4.9 there was a bug in the IVW Q statistic estimate,
## leading to a slight underestimation in heterogeneity. This has now been
## resolved.
##   id.exposure id.outcome                        outcome exposure
## 1      lnCcB5          7 Coronary heart disease || id:7      BMI
## 2      lnCcB5          7 Coronary heart disease || id:7      BMI
##                      method        Q Q_df     Q_pval
## 1                  MR Egger 39.68702   25 0.03139634
## 2 Inverse variance weighted 39.68703   26 0.04185786

Horizontal pleiotropy

The intercept term in MR Egger regression can be a useful indication of whether directional horizontal pleiotropy is driving the results of an MR analysis. This can be obtained as follows:

##   id.exposure id.outcome                        outcome exposure
## 1      lnCcB5          7 Coronary heart disease || id:7      BMI
##   egger_intercept          se     pval
## 1    1.110681e-05 0.005263163 0.998333

Single SNP analysis

To obtain the MR estimates using each of the SNPs singly we can do the following:

This returns a data.frame of results that is similar to the output from mr() except it performs the analysis multiple times for each exposure-outcome combination - each time using a different single SNP to perform the analysis.

The method used to perform the single SNP MR is the Wald ratio by default, though this can be changed, e.g. to use the fixed effects meta analysis method instead:

The mr_singlesnp() function calculates the full MR using all available SNPs as well, and by default it uses the IVW and MR Egger methods. This can be specified as so:

will perform only the maximum likelihood method for the combined test.

Leave-one-out analysis

It is possible to perform a leave-one-out analysis, where the MR is performed again but leaving out each SNP in turn, to identify if a single SNP is driving the association.

By default the method used is the inverse variance weighted method, but this can be changed by using the method argument.


Plots

There are a few ways to visualise the results, listed below

Scatter plot

We can depict the relationship of the SNP effects on the exposure against the SNP effects on the outcome using a scatter plot.

## Analysing 'lnCcB5' on '7'
## Warning: Ignoring unknown aesthetics: text

A scatter plot is created for each exposure-outcome test, and stored in p1 as a list of plots. For example, to plot the first scatter plot:

And to see how many plots there are:

## [1] 1

Lines are drawn for each method used in mr(dat), the slope of the line corresponding to the estimated causal effect. To limit which lines are drawn, simply specify the desired methods, e.g. to only draw MR Egger and IVW:

## Analysing 'lnCcB5' on '7'
## Warning: Ignoring unknown aesthetics: text

It is possible to save this plot using the ggsave() function, e.g. to save as a pdf

or save as a png

See ?ggsave for more info.

Forest plot

Use the mr_forest_plot() function to compare the MR estimates using the different MR methods against the single SNP tests.

## Warning: Removed 1 rows containing missing values (geom_errorbarh).
## Warning: Removed 1 rows containing missing values (geom_point).

Here, the plot shows the causal effect as estimated using each of the SNPs on their own, and comparing against the causal effect as estimated using the methods that use all the SNPs.

To get plots that use different methods, specify them in the mr_singlesnp() function:

## Warning: Removed 1 rows containing missing values (geom_errorbarh).
## Warning: Removed 1 rows containing missing values (geom_point).

Leave-one-out plot

Use the mr_leaveoneout_plot function to visualise the leave-one-out analysis:

## Warning: Removed 1 rows containing missing values (geom_errorbarh).
## Warning: Removed 1 rows containing missing values (geom_point).

Specify the test to use e.g. mr_leaveoneout(dat, method = mr_egger_regression) to use Egger regression.

Funnel plot

Asymmetry in a funnel plot is useful for gauging the reliability of a particular MR analysis. Funnel plots can be produced using the single SNP results as follows: howzit

1-to-many forest plot

A 1-to-many MR analysis interrogates the effect of a single exposure on multiple outcomes or multiple exposures on a single outcome. The results of this analysis can be visualised using the 1-to-many forest plot, with or without stratification on a categorical variable. The function assumes the results are already in the right order for plotting. As such, users are advised to sort their results according to how they would like them to appear in the plot. Users can use their own code to do this or they can use the Sort.1.to.many() function.

Step 1: generate 1-to-many MR results

## Requesting default values. Extracting from pre-clumped data
## Warning in extract_instruments(outcomes = c(2, 100, 1032, 104, 1, 72,
## 999)): From version 0.4.2 the exposure name format has changed.
## 
##           Adiponectin || id:1            Body fat || id:999 
##                            14                            10 
##       Body mass index || id:2   Hip circumference || id:100 
##                            79                             2 
## Waist circumference || id:104   Waist-to-hip ratio || id:72 
##                             1                            31
## Extracting data for 133 SNP(s) from 1 GWAS(s)
## Token cache file: mrbase.oauth
## Warning in format_d(d): From version 0.4.2 the outcome format has
## changed. You can find the deprecated version of the output name in
## outcome.deprecated
## Harmonising Adiponectin || id:1 (1) and Coronary heart disease || id:7 (7)
## Harmonising Hip circumference || id:100 (100) and Coronary heart disease || id:7 (7)
## Harmonising Waist circumference || id:104 (104) and Coronary heart disease || id:7 (7)
## Harmonising Body mass index || id:2 (2) and Coronary heart disease || id:7 (7)
## Harmonising Waist-to-hip ratio || id:72 (72) and Coronary heart disease || id:7 (7)
## Removing the following SNPs for being palindromic with intermediate allele frequencies:
## rs9491696
## Harmonising Body fat || id:999 (999) and Coronary heart disease || id:7 (7)
## Analysing '1' on '7'
## Analysing '100' on '7'
## Analysing '104' on '7'
## Analysing '2' on '7'
## Analysing '72' on '7'
## Analysing '999' on '7'

Step 2. Make the 1-to-many forest plot

Example 1. Effect of multiple risk factors on coronary heart disease

In this example we wish to plot results from an MR analysis of the effect of multiple risk factors on coronary heart disease, with results sorted by decreasing effect size (largest effect at the top of the plot) and with one MR method for each unique exposure-outcome combination.

## Warning: Removed 12 rows containing missing values (geom_vline).

It is also possible to add additional columns and column titles:

## [1] "nsnp"
## [1] "pval"
## Warning: Removed 12 rows containing missing values (geom_vline).

In my own workflow I prefer to to keep the plot free of axis and column titles:

## [1] "nsnp"
## [1] "pval"
## Warning: Removed 12 rows containing missing values (geom_vline).

I would then import the above plot into another program like powerpoint and add the axis/column titles there.

Example 2. Group results by MR method

In this next example we plot the results from an MR analysis of the effect of multiple risk factors on coronary heart disease, with results grouped by MR method. We also want the result for the IVW method to be given priority and to go above the other methods. We also want the trait with the largest IVW effect size to go the top of the plot.

## Analysing '1' on '7'
## Analysing '100' on '7'
## Analysing '104' on '7'
## Analysing '2' on '7'
## Analysing '72' on '7'
## Analysing '999' on '7'
## Warning: Removed 5 rows containing missing values (geom_vline).

## Warning: Removed 5 rows containing missing values (geom_vline).

## Warning: Removed 5 rows containing missing values (geom_vline).

## Warning: Removed 5 rows containing missing values (geom_vline).
## Warning: Removed 1 rows containing missing values (geom_vline).

## Warning: Removed 1 rows containing missing values (geom_vline).

To keep the plot legible it’s advisable not to plot too many results in a single plot (as can be seen in the above example the results for waist circumference and hip circumference can’t be seen). In my own workflow I divide such results across two separate plots:

## Warning: Removed 10 rows containing missing values (geom_vline).

## Warning: Removed 10 rows containing missing values (geom_vline).

## Warning: Removed 5 rows containing missing values (geom_vline).
## Warning: Removed 5 rows containing missing values (geom_vline).
## Warning: Removed 1 rows containing missing values (geom_vline).

## Warning: Removed 1 rows containing missing values (geom_vline).

You can then join the above plots together in powerpoint or some other program.

Somewhat unintuitvely we set the TraitM argument to “method” in the above example. This is because we are stratifying the results on trait names (specified in the “by” argument). Normally the row labels would correspond to trait names but in this example we want the row names to correspond to the MR method.

Example 3. Stratify results on a grouping variable

In this next example we plot the same results as above but with results stratified by a grouping variable. We also select one MR method for each unique exposure-outcome combination and sort the results by decreasing effect size within each group (i.e. largest effect at the top).

## Analysing '1' on '7'
## Analysing '100' on '7'
## Analysing '104' on '7'
## Analysing '2' on '7'
## Analysing '72' on '7'
## Analysing '999' on '7'
## Warning: Removed 6 rows containing missing values (geom_vline).

## Warning: Removed 6 rows containing missing values (geom_vline).

In the above example we made up an arbitrary grouping variable called “subcategory” with values “Group 1” and “Group 2”. Typically, however, the grouping variable might correspond to something like a trait ontology (e.g. anthropometric and glycemic traits) or study design (e.g. MR and observational studies).


MR.RAPS: Many weak instruments analysis

MR.RAPS (Robust Adjusted Profile Score) is a recently proposed method that considers the measurement error in SNP-exposure effects, is unbiased when there are many (e.g. hundreds of) weak instruments, and is robust to systematic and idiosyncratic pleiotropy. See the arXiv preprint for more detail about the statistical methodology.

MR.RAPS is implemented in the R package mr.raps that is available on CRAN. It can be directly called from TwoSampleMR by

MR.RAPS comes with two main options: over.dispersion (whether the method should consider systematic pleiotropy) and loss.function (either “l2”, “huber”, or “tukey”). The latter two loss functions are robust to idiosyncratic pleiotropy. The default option is over.dispersion = TRUE and loss.function = “tukey”. To change these options, modify the parameters argument of mr by (for example)


Reports

A report can be generated that performs all MR analyses, sensitivity analyses, and plots, and presents them in a single self-contained html web page, word document, or pdf document.

By default this produces a html file in the current working directory, but see the help pages on how to modify this.

This function will create a separate report file for every exposure-outcome combination that is present in the dat object.


MR Steiger directionality test

This is an implementation of the method described here:

Hemani G, Tilling K, Davey Smith G.
Orienting the causal relationship between imprecisely measured traits using GWAS summary data.
PLoS Genetics. 2017. 13(11).

In MR it is assumed that the instruments influence the exposure first and then the outcome through the exposure. But sometimes this is difficult to evaluate, for example is a cis-acting SNP influencing gene expression levels or DNA methylation levels first? The causal direction between the hypothesised exposure and outcomes can be tested using the Steiger test [reference to go here]. For example:

## r.exposure and/or r.outcome not present.
## Calculating approximate SNP-exposure and/or SNP-outcome correlations, assuming all are quantitative traits. Please pre-calculate r.exposure and/or r.outcome using get_r_from_lor() for any binary traits
## Estimating correlation for quantitative trait.
## This method is an approximation, and may be numerically unstable.
## Ideally you should estimate r directly from independent replication samples.
## Use get_r_from_lor for binary traits.
## Estimating correlation for quantitative trait.
## This method is an approximation, and may be numerically unstable.
## Ideally you should estimate r directly from independent replication samples.
## Use get_r_from_lor for binary traits.
id.exposure id.outcome exposure outcome snp_r2.exposure snp_r2.outcome correct_causal_direction steiger_pval
lnCcB5 7 BMI Coronary heart disease || id:7 0.0111681 0.0006293 TRUE 0

It calculates the variance explained in the exposure and the outcome by the instrumenting SNPs, and tests if the variance in the outcome is less than the exposure.

This test is, like many others, liable to give inaccurate causal directions under some measurement error parameters in the exposure and the outcome (e.g. if the outcome has much lower measurement precision then its proportion of variance explained will be underestimated). Sensitivity can be applied to evaluate the extent to which the inferred causal direction is liable to measurement error, in two ways.

  1. Provide estimates of measurement error for the exposure and the outcome, and obtain an adjusted estimate of the causal direction
  2. For all possible values of measurement error, identify the proportion of the parameter space which supports the inferred causal direction

These tests are obtained using:


Multivariable MR

When SNPs instrument multiple potential exposures, for example in the case of different lipid fractions, one method for overcoming this problem is to estimate the influence of each lipid conditioning on the effects of the SNPs on the other lipids. Multivariable MR can be performed using the R package as follows.

The GWAS IDs for HDL, LDL and total cholesterol are 299, 300 and 302. The GWAS ID for coronary heart disease (CHD) is 7. In this example we will estimate the multivariable effects of HDL, LDL and total cholesterol on CHD.

First obtain the instruments for each lipid fraction. This entails obtaining a combined set of SNPs including all instruments, and getting those SNPs for each lipid fraction. Therefore, if there are e.g. 20 instruments for each of 3 lipid fractions, but combined there are 30 unique SNPs, then we need to extract each of the 30 SNPs from each lipid fraction (exposure).

Next, also extract those SNPs from the outcome.

Once the data has been obtained, harmonise so that all are on the same reference allele.

Finally, perform the multivariable MR analysis

This generates a table of results.

Note about MV methods

There are several different ways in which this analysis can be formulated. e.g. consider 3 exposures against one outcome, one could:

  1. Fit all exposures together or fit one exposure at a time against the residuals of the outcome that has been adjusted for the other outcomes. The former is recommended by default in this R package through the mv_multiple function but the latter was how MV MR was originally described by Burgess et al 2015 and can be done with mv_residual.
  2. Fitting all instruments for all exposures (default) or only fitting the instruments for each exposure sequentially
  3. Forcing the slopes through the origin (default) or allowing an intercept term.

With these three different parameters there are eight different ways to do MV analysis. We recommend the default settings as described above.

Note about visualisation

Plots can be generated using the plots=TRUE argument for mv_multiple and mv_residual.

The current plots being generated are not necessarily adequate because while they show the slope through the raw points, they do not demonstrate that the raw points might be effectively different between plots because they are conditional on the other exposures.


MR estimates when instruments are correlated

In the examples shown so far it is assumed that instruments are independent (i.e. they are not in linkage disequilibrium, LD). This is to avoid ‘double counting’ effects. An alternative approach is to estimate the MR effects accounting for the correlation between variants.

The TwoSampleMR package has not implemented this yet, but the MendelianRandomization R package by Olena Yavorska and Stephen Burgess does have this functionality. We can use the TwoSampleMR package to extract, format and harmonise data, and then convert to the format required by the MendelianRandomization package. The MR-Base server has the individual level genetic data for ~500 Europeans in 1000 genomes data, and can obtain the LD matrix for a set of SNPs using these data. For example:

Here ld_matrix returns the LD correlation values (not R^2) for each pair of variants present in the 1000 genomes data set.

Convert to the MRInput format for the MendelianRandomization package:

This produces a list of MRInput objects that can be used with the MendelianRandomization functions, e.g.

Alternatively, convert to the MRInput format but also obtaining the LD matrix for the instruments


MR-MoE: Using a mixture of experts machine learning approach

We recently developed MR-MoE, a method to choose the most appropriate amongst several MR tests using a machine learning algorithm. Note that the method is still under review, but full details are described here: biorxiv.org/content/early/2017/08/23/173682.

MR-MoE operates by taking a set of harmonised data, inferring some characteristics about the dataset, and using those characteristics to predict how well each of the different MR methods will perform on the dataset, in terms of maximising power while minimising false discovery rates.

In order to run the analysis you must download an RData object that contains the trained random forests that are used to predict the efficacy of each method. This can be downloaded from here:

dropbox.com/s/k0grrhh0ak8er7q/rf.rdata?dl=0

Caution: this is a large file (approx 3Gb)

Once downloaded, read in the object and use the mr_moe function to perform the analysis. An example is shown here, estimating the causal effect of BMI on coronary heart disease:

The function does the following:

  1. Performs MR using each of 14 MR methods
  2. Applies Steiger filtering to remove SNPs that do not have substantially larger R^2 with the exposure than the outcome. Note - for binary traits ensure number of cases, number of controls, and allele frequencies are available for each SNP. For continuous traits make sure the p-value and sample size is available. The function infers if a trait is binary or continuous based on the units.exposure and units.outcome columns - binary traits must have those values set to ‘log odds’
  3. Performs the 14 MR methods again but using the subset of SNPs that survive Steiger filtering
  4. Generates meta data about the summary data to predict the most reliable of the 28 methods applied.

For every exposure / outcome combination in the dat object, the MR-MoE method is applied. The function returns a list which is as long as the number of exposure / outcome combinations. In this case, it will be of length 1, containing the result for BMI on CHD.

The result object itself is a list with the following elements:

  • m1 is the set of MR estimates applied using all instruments
  • m2 is the set of MR estimates applied to the SNPs after Steiger filtering
  • m3 is a collection of all m1 and m2 MR estimates.

Post MR results management

The TwoSampleMR package also provides the following functions for managing or editing MR results.

Split outcome names

The outcome column in the output of mr() combines the original outcome name with the outcome trait ID.

##    id.exposure id.outcome                        outcome
## 3            1          7 Coronary heart disease || id:7
## 6          100          7 Coronary heart disease || id:7
## 7          104          7 Coronary heart disease || id:7
## 15          72          7 Coronary heart disease || id:7
## 10           2          7 Coronary heart disease || id:7
## 20         999          7 Coronary heart disease || id:7
##               exposure                    method nsnp           b
## 3          Adiponectin Inverse variance weighted   14 -0.08598819
## 6    Hip circumference Inverse variance weighted    2 -0.18631007
## 7  Waist circumference                Wald ratio    1 -0.44629630
## 15  Waist-to-hip ratio Inverse variance weighted   30  0.47947406
## 10     Body mass index Inverse variance weighted   79  0.44590910
## 20            Body fat Inverse variance weighted   10  0.26645163
##            se         pval subcategory
## 3  0.07011418 2.200473e-01     Group 1
## 6  0.21290038 3.815171e-01     Group 1
## 7  0.34825185 2.000065e-01     Group 1
## 15 0.14733490 1.136665e-03     Group 2
## 10 0.05898302 4.032020e-14     Group 2
## 20 0.32944941 4.186425e-01     Group 2

The outcome column can be split into separate columns for the id and outcome name using the split_outcome function:

##    id.exposure id.outcome                outcome            exposure
## 3            1          7 Coronary heart disease         Adiponectin
## 6          100          7 Coronary heart disease   Hip circumference
## 7          104          7 Coronary heart disease Waist circumference
## 15          72          7 Coronary heart disease  Waist-to-hip ratio
## 10           2          7 Coronary heart disease     Body mass index
## 20         999          7 Coronary heart disease            Body fat
##                       method nsnp           b         se         pval
## 3  Inverse variance weighted   14 -0.08598819 0.07011418 2.200473e-01
## 6  Inverse variance weighted    2 -0.18631007 0.21290038 3.815171e-01
## 7                 Wald ratio    1 -0.44629630 0.34825185 2.000065e-01
## 15 Inverse variance weighted   30  0.47947406 0.14733490 1.136665e-03
## 10 Inverse variance weighted   79  0.44590910 0.05898302 4.032020e-14
## 20 Inverse variance weighted   10  0.26645163 0.32944941 4.186425e-01
##    subcategory
## 3      Group 1
## 6      Group 1
## 7      Group 1
## 15     Group 2
## 10     Group 2
## 20     Group 2

Split exposure names

Similarly to the outcome column, the exposure column in the output of mr() combines the original exposure name with the exposure trait ID. This can be split into separate columns for the id and exposure name using the split_exposure function.

Generate odds ratios with 95% confidence intervals

Users can convert log odds ratios into odds ratios with 95% confidence intervals using:

##    id.exposure id.outcome                        outcome
## 3            1          7 Coronary heart disease || id:7
## 6          100          7 Coronary heart disease || id:7
## 7          104          7 Coronary heart disease || id:7
## 15          72          7 Coronary heart disease || id:7
## 10           2          7 Coronary heart disease || id:7
## 20         999          7 Coronary heart disease || id:7
##               exposure                    method nsnp           b
## 3          Adiponectin Inverse variance weighted   14 -0.08598819
## 6    Hip circumference Inverse variance weighted    2 -0.18631007
## 7  Waist circumference                Wald ratio    1 -0.44629630
## 15  Waist-to-hip ratio Inverse variance weighted   30  0.47947406
## 10     Body mass index Inverse variance weighted   79  0.44590910
## 20            Body fat Inverse variance weighted   10  0.26645163
##            se         pval subcategory      lo_ci      up_ci        or
## 3  0.07011418 2.200473e-01     Group 1 -0.2234120 0.05143559 0.9176051
## 6  0.21290038 3.815171e-01     Group 1 -0.6035948 0.23097468 0.8300162
## 7  0.34825185 2.000065e-01     Group 1 -1.1288699 0.23627733 0.6399941
## 15 0.14733490 1.136665e-03     Group 2  0.1906976 0.76825047 1.6152247
## 10 0.05898302 4.032020e-14     Group 2  0.3303024 0.56151581 1.5619095
## 20 0.32944941 4.186425e-01     Group 2 -0.3792692 0.91217247 1.3053245
##     or_lci95 or_uci95
## 3  0.7997853 1.052781
## 6  0.5468423 1.259827
## 7  0.3233985 1.266526
## 15 1.2100935 2.155991
## 10 1.3913888 1.753328
## 20 0.6843614 2.489726

Subset on method

It is sometimes useful to subset results on MR method, so that there is one unique result for each exposure-outcome combination:

##    id.exposure id.outcome                        outcome
## 3            1          7 Coronary heart disease || id:7
## 6          100          7 Coronary heart disease || id:7
## 7          104          7 Coronary heart disease || id:7
## 15          72          7 Coronary heart disease || id:7
## 10           2          7 Coronary heart disease || id:7
## 20         999          7 Coronary heart disease || id:7
##               exposure                    method nsnp           b
## 3          Adiponectin Inverse variance weighted   14 -0.08598819
## 6    Hip circumference Inverse variance weighted    2 -0.18631007
## 7  Waist circumference                Wald ratio    1 -0.44629630
## 15  Waist-to-hip ratio Inverse variance weighted   30  0.47947406
## 10     Body mass index Inverse variance weighted   79  0.44590910
## 20            Body fat Inverse variance weighted   10  0.26645163
##            se         pval subcategory
## 3  0.07011418 2.200473e-01     Group 1
## 6  0.21290038 3.815171e-01     Group 1
## 7  0.34825185 2.000065e-01     Group 1
## 15 0.14733490 1.136665e-03     Group 2
## 10 0.05898302 4.032020e-14     Group 2
## 20 0.32944941 4.186425e-01     Group 2

The default is to subset on the IVW method when >1 SNP is available and to use the Wald ratio method when a single SNP is available. Users can specify which multi-SNP method to subset on.

Combine all results

It is often useful to combine all results and study level characterists into a single dataframe or table, e.g. for sharing results with collaborators or when the user wishes to present all results in a single table or figure. This can be done using the combine_all_mrresults() function:

## Analysing 'lnCcB5' on '7'
## Warning in mr_heterogeneity(dat): Prior to version 0.4.9 there was a bug
## in the IVW Q statistic estimate, leading to a slight underestimation in
## heterogeneity. This has now been resolved.
## Token cache file: mrbase.oauth
##    id.outcome                    Method id.exposure                outcome
## 1           7 Inverse variance weighted      lnCcB5 Coronary heart disease
## 2           7                  MR Egger      lnCcB5 Coronary heart disease
## 3           7               Simple mode      lnCcB5 Coronary heart disease
## 4           7                Wald ratio      lnCcB5 Coronary heart disease
## 5           7                Wald ratio      lnCcB5 Coronary heart disease
## 6           7                Wald ratio      lnCcB5 Coronary heart disease
## 7           7                Wald ratio      lnCcB5 Coronary heart disease
## 8           7                Wald ratio      lnCcB5 Coronary heart disease
## 9           7                Wald ratio      lnCcB5 Coronary heart disease
## 10          7                Wald ratio      lnCcB5 Coronary heart disease
## 11          7                Wald ratio      lnCcB5 Coronary heart disease
## 12          7                Wald ratio      lnCcB5 Coronary heart disease
## 13          7                Wald ratio      lnCcB5 Coronary heart disease
## 14          7                Wald ratio      lnCcB5 Coronary heart disease
## 15          7                Wald ratio      lnCcB5 Coronary heart disease
## 16          7                Wald ratio      lnCcB5 Coronary heart disease
## 17          7                Wald ratio      lnCcB5 Coronary heart disease
## 18          7                Wald ratio      lnCcB5 Coronary heart disease
## 19          7                Wald ratio      lnCcB5 Coronary heart disease
## 20          7                Wald ratio      lnCcB5 Coronary heart disease
## 21          7                Wald ratio      lnCcB5 Coronary heart disease
## 22          7                Wald ratio      lnCcB5 Coronary heart disease
## 23          7                Wald ratio      lnCcB5 Coronary heart disease
## 24          7                Wald ratio      lnCcB5 Coronary heart disease
## 25          7                Wald ratio      lnCcB5 Coronary heart disease
## 26          7                Wald ratio      lnCcB5 Coronary heart disease
## 27          7                Wald ratio      lnCcB5 Coronary heart disease
## 28          7                Wald ratio      lnCcB5 Coronary heart disease
## 29          7                Wald ratio      lnCcB5 Coronary heart disease
## 30          7                Wald ratio      lnCcB5 Coronary heart disease
## 31          7           Weighted median      lnCcB5 Coronary heart disease
## 32          7             Weighted mode      lnCcB5 Coronary heart disease
##    exposure nsnp            b         se         pval        Q Q_df
## 1       BMI   27  0.113916838 0.01611170 1.544411e-12 39.68703   26
## 2       BMI   27  0.113856621 0.03292752 1.962323e-03 39.68702   25
## 3       BMI   27  0.075103310 0.03916126 6.618378e-02       NA   NA
## 4       BMI    1  0.160623077 0.09240846 8.217808e-02       NA   NA
## 5       BMI    1  0.377176471 0.12765706 3.130673e-03       NA   NA
## 6       BMI    1  0.140057895 0.05720105 1.434447e-02       NA   NA
## 7       BMI    1  0.170033333 0.05193611 1.060763e-03       NA   NA
## 8       BMI    1  0.217227273 0.09313364 1.967832e-02       NA   NA
## 9       BMI    1  0.076420513 0.02465846 1.940703e-03       NA   NA
## 10      BMI    1 -0.047370588 0.08782882 5.896445e-01       NA   NA
## 11      BMI    1 -0.194550000 0.12596500 1.224729e-01       NA   NA
## 12      BMI    1  0.041933333 0.15620667 7.883547e-01       NA   NA
## 13      BMI    1  0.260353333 0.08578667 2.406212e-03       NA   NA
## 14      BMI    1  0.198700000 0.18209667 2.751943e-01       NA   NA
## 15      BMI    1  0.058550000 0.09658100 5.443641e-01       NA   NA
## 16      BMI    1  0.109253846 0.08202231 1.828597e-01       NA   NA
## 17      BMI    1  0.075516667 0.16382833 6.448345e-01       NA   NA
## 18      BMI    1  0.080015385 0.07568000 2.903812e-01       NA   NA
## 19      BMI    1  0.117345161 0.03919355 2.753534e-03       NA   NA
## 20      BMI    1  0.044733333 0.15153556 7.678409e-01       NA   NA
## 21      BMI    1  0.034545455 0.05331636 5.170280e-01       NA   NA
## 22      BMI    1  0.005916667 0.15897667 9.703118e-01       NA   NA
## 23      BMI    1  0.186522222 0.12302667 1.294910e-01       NA   NA
## 24      BMI    1  0.036766667 0.15530167 8.128558e-01       NA   NA
## 25      BMI    1  0.219600000 0.11139800 4.868842e-02       NA   NA
## 26      BMI    1  0.250947826 0.04565652 3.875592e-08       NA   NA
## 27      BMI    1  0.139171429 0.06628643 3.576877e-02       NA   NA
## 28      BMI    1  0.068125000 0.07895417 3.882241e-01       NA   NA
## 29      BMI    1 -0.017985714 0.08753071 8.371980e-01       NA   NA
## 30      BMI    1  0.194192308 0.08915000 2.938633e-02       NA   NA
## 31      BMI   27  0.079404175 0.02045798 1.038869e-04       NA   NA
## 32      BMI   27  0.088753773 0.02055830 2.036362e-04       NA   NA
##        Q_pval samplesize        SNP author category        consortium
## 1  0.04185786         NA       <NA> Nikpay  Disease CARDIoGRAMplusC4D
## 2  0.03139634         NA       <NA> Nikpay  Disease CARDIoGRAMplusC4D
## 3          NA         NA       <NA> Nikpay  Disease CARDIoGRAMplusC4D
## 4          NA     184305 rs10150332 Nikpay  Disease CARDIoGRAMplusC4D
## 5          NA     184305 rs11847697 Nikpay  Disease CARDIoGRAMplusC4D
## 6          NA     184305 rs10767664 Nikpay  Disease CARDIoGRAMplusC4D
## 7          NA     184305 rs10938397 Nikpay  Disease CARDIoGRAMplusC4D
## 8          NA     184305 rs10968576 Nikpay  Disease CARDIoGRAMplusC4D
## 9          NA     184305  rs1558902 Nikpay  Disease CARDIoGRAMplusC4D
## 10         NA     184305 rs12444979 Nikpay  Disease CARDIoGRAMplusC4D
## 11         NA     184305 rs13078807 Nikpay  Disease CARDIoGRAMplusC4D
## 12         NA     184305  rs1555543 Nikpay  Disease CARDIoGRAMplusC4D
## 13         NA     184305  rs2287019 Nikpay  Disease CARDIoGRAMplusC4D
## 14         NA     184305   rs206936 Nikpay  Disease CARDIoGRAMplusC4D
## 15         NA     184305  rs2112347 Nikpay  Disease CARDIoGRAMplusC4D
## 16         NA     184305  rs2241423 Nikpay  Disease CARDIoGRAMplusC4D
## 17         NA     184305    rs29941 Nikpay  Disease CARDIoGRAMplusC4D
## 18         NA     184305  rs2815752 Nikpay  Disease CARDIoGRAMplusC4D
## 19         NA     184305  rs2867125 Nikpay  Disease CARDIoGRAMplusC4D
## 20         NA     184305  rs2890652 Nikpay  Disease CARDIoGRAMplusC4D
## 21         NA     184305   rs543874 Nikpay  Disease CARDIoGRAMplusC4D
## 22         NA     184305  rs3817334 Nikpay  Disease CARDIoGRAMplusC4D
## 23         NA     184305  rs4771122 Nikpay  Disease CARDIoGRAMplusC4D
## 24         NA     184305  rs4929949 Nikpay  Disease CARDIoGRAMplusC4D
## 25         NA     184305   rs887912 Nikpay  Disease CARDIoGRAMplusC4D
## 26         NA     184305   rs571312 Nikpay  Disease CARDIoGRAMplusC4D
## 27         NA     184305   rs713586 Nikpay  Disease CARDIoGRAMplusC4D
## 28         NA     184305  rs7138803 Nikpay  Disease CARDIoGRAMplusC4D
## 29         NA     184305  rs9816226 Nikpay  Disease CARDIoGRAMplusC4D
## 30         NA     184305   rs987237 Nikpay  Disease CARDIoGRAMplusC4D
## 31         NA         NA       <NA> Nikpay  Disease CARDIoGRAMplusC4D
## 32         NA         NA       <NA> Nikpay  Disease CARDIoGRAMplusC4D
##    ncase ncontrol nsnps.outcome.array     pmid population sample_size
## 1  60801   123504             9455779 26343387      Mixed      184305
## 2  60801   123504             9455779 26343387      Mixed      184305
## 3  60801   123504             9455779 26343387      Mixed      184305
## 4  60801   123504             9455779 26343387      Mixed      184305
## 5  60801   123504             9455779 26343387      Mixed      184305
## 6  60801   123504             9455779 26343387      Mixed      184305
## 7  60801   123504             9455779 26343387      Mixed      184305
## 8  60801   123504             9455779 26343387      Mixed      184305
## 9  60801   123504             9455779 26343387      Mixed      184305
## 10 60801   123504             9455779 26343387      Mixed      184305
## 11 60801   123504             9455779 26343387      Mixed      184305
## 12 60801   123504             9455779 26343387      Mixed      184305
## 13 60801   123504             9455779 26343387      Mixed      184305
## 14 60801   123504             9455779 26343387      Mixed      184305
## 15 60801   123504             9455779 26343387      Mixed      184305
## 16 60801   123504             9455779 26343387      Mixed      184305
## 17 60801   123504             9455779 26343387      Mixed      184305
## 18 60801   123504             9455779 26343387      Mixed      184305
## 19 60801   123504             9455779 26343387      Mixed      184305
## 20 60801   123504             9455779 26343387      Mixed      184305
## 21 60801   123504             9455779 26343387      Mixed      184305
## 22 60801   123504             9455779 26343387      Mixed      184305
## 23 60801   123504             9455779 26343387      Mixed      184305
## 24 60801   123504             9455779 26343387      Mixed      184305
## 25 60801   123504             9455779 26343387      Mixed      184305
## 26 60801   123504             9455779 26343387      Mixed      184305
## 27 60801   123504             9455779 26343387      Mixed      184305
## 28 60801   123504             9455779 26343387      Mixed      184305
## 29 60801   123504             9455779 26343387      Mixed      184305
## 30 60801   123504             9455779 26343387      Mixed      184305
## 31 60801   123504             9455779 26343387      Mixed      184305
## 32 60801   123504             9455779 26343387      Mixed      184305
##                  sex    subcategory                  trait year
## 1  Males and females Cardiovascular Coronary heart disease 2015
## 2  Males and females Cardiovascular Coronary heart disease 2015
## 3  Males and females Cardiovascular Coronary heart disease 2015
## 4  Males and females Cardiovascular Coronary heart disease 2015
## 5  Males and females Cardiovascular Coronary heart disease 2015
## 6  Males and females Cardiovascular Coronary heart disease 2015
## 7  Males and females Cardiovascular Coronary heart disease 2015
## 8  Males and females Cardiovascular Coronary heart disease 2015
## 9  Males and females Cardiovascular Coronary heart disease 2015
## 10 Males and females Cardiovascular Coronary heart disease 2015
## 11 Males and females Cardiovascular Coronary heart disease 2015
## 12 Males and females Cardiovascular Coronary heart disease 2015
## 13 Males and females Cardiovascular Coronary heart disease 2015
## 14 Males and females Cardiovascular Coronary heart disease 2015
## 15 Males and females Cardiovascular Coronary heart disease 2015
## 16 Males and females Cardiovascular Coronary heart disease 2015
## 17 Males and females Cardiovascular Coronary heart disease 2015
## 18 Males and females Cardiovascular Coronary heart disease 2015
## 19 Males and females Cardiovascular Coronary heart disease 2015
## 20 Males and females Cardiovascular Coronary heart disease 2015
## 21 Males and females Cardiovascular Coronary heart disease 2015
## 22 Males and females Cardiovascular Coronary heart disease 2015
## 23 Males and females Cardiovascular Coronary heart disease 2015
## 24 Males and females Cardiovascular Coronary heart disease 2015
## 25 Males and females Cardiovascular Coronary heart disease 2015
## 26 Males and females Cardiovascular Coronary heart disease 2015
## 27 Males and females Cardiovascular Coronary heart disease 2015
## 28 Males and females Cardiovascular Coronary heart disease 2015
## 29 Males and females Cardiovascular Coronary heart disease 2015
## 30 Males and females Cardiovascular Coronary heart disease 2015
## 31 Males and females Cardiovascular Coronary heart disease 2015
## 32 Males and females Cardiovascular Coronary heart disease 2015
##       intercept intercept_se intercept_pval
## 1            NA           NA             NA
## 2  1.110681e-05  0.005263163       0.998333
## 3            NA           NA             NA
## 4            NA           NA             NA
## 5            NA           NA             NA
## 6            NA           NA             NA
## 7            NA           NA             NA
## 8            NA           NA             NA
## 9            NA           NA             NA
## 10           NA           NA             NA
## 11           NA           NA             NA
## 12           NA           NA             NA
## 13           NA           NA             NA
## 14           NA           NA             NA
## 15           NA           NA             NA
## 16           NA           NA             NA
## 17           NA           NA             NA
## 18           NA           NA             NA
## 19           NA           NA             NA
## 20           NA           NA             NA
## 21           NA           NA             NA
## 22           NA           NA             NA
## 23           NA           NA             NA
## 24           NA           NA             NA
## 25           NA           NA             NA
## 26           NA           NA             NA
## 27           NA           NA             NA
## 28           NA           NA             NA
## 29           NA           NA             NA
## 30           NA           NA             NA
## 31           NA           NA             NA
## 32           NA           NA             NA

This combines all results from mr(), mr_heterogeneity(), mr_pleiotropy_test() and mr_singlesnp() into a single dataframe. It also merges the results with outcome study level characteristics from the available_outcomes() function, including sample size characteristics. If requested, it also exponentiates results (e.g. if the user wants log odds ratio converted into odds ratios with 95 percent confidence intervals). The exposure and outcome columns from the output from mr() contain both the trait names and trait ids. The combine_all_mrresults() function splits these into separate columns by default.

References


Bowden, Jack, George Davey Smith, and Stephen Burgess. 2015. “Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression.” International Journal of Epidemiology In press.

Davey Smith, G., and S. Ebrahim. 2003. “’Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease?” International Journal of Epidemiology 32 (1): 1–22. https://doi.org/10.1093/ije/dyg070.

Davey Smith, George, and Gibran Hemani. 2014. “Mendelian randomization: genetic anchors for causal inference in epidemiological studies.” Human Molecular Genetics 23 (R1). Oxford Univ Press: R89—–R98. https://doi.org/10.1093/hmg/ddu328.

Pierce, Brandon L, and Stephen Burgess. 2013. “Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators.” American Journal of Epidemiology 178 (7): 1177–84. https://doi.org/10.1093/aje/kwt084.